Abstract: The finite-sample as well as the asymptotic distribution of Leung
and Barron's (2006) model averaging estimator are derived in the context of 
a linear regression model. An impossibility result regarding the estimation of 
the finite-sample distribution of the model averaging estimator is obtained. 



1. Introduction 



Model averaging or model mixing estimators have received increased interest in
recent years; see, e.g., Yang [19, 20, 21], Magnus [13], Leung and Barron [12],
and the references therein. [For a discussion of model averaging from a Bayesian 
perspective see Hoeting et al. [4].] The main idea behind this class of estimators is 
that averaging estimators obtained from different models should have the potential 
to achieve better overall risk performance when compared to a strategy that only 
uses the estimator obtained from one model. As a consequence, the above mentioned 
literature concentrates on studying the risk properties of model averaging estimators 
and on associated oracle inequalities. In this paper we derive the finite-sample as 
well as the asymptotic distribution (under fixed as well as under moving parameters) 
of the model averaging estimator studied in [12]; for the sake of simplicity we
concentrate on the special case when only two candidate models are considered. 
Not too surprisingly, it turns out that the finite-sample distribution (after centering 
and scaling) depends on unknown parameters, and thus cannot be directly used for 
inferential purposes. As a consequence, one may be interested in estimators of this 
distribution, e.g., for purposes of conducting inference. We establish an impossibility 
result by showing that any estimator of the finite-sample distribution of the model 
averaging estimator is necessarily "bad" in a sense made precise in Section 4. While 
we concentrate on Leung and Barron's [12] estimator (in the context of only two
candidate models) as a prototypical example of a model averaging estimator in this 
paper, similar results will typically hold for other model averaging estimators (and 
more than two candidate models) as well. 

We note that results on distributional properties of post-model-selection
estimators that parallel the development in the present paper have been obtained in
[E 0, B E 14 > Hi HI 12 1 . See also Leeb and Pötscher for impossibility
results pertaining to shrinkage-type estimators like the Lasso or Stein's estimator. An
easily accessible exposition of the issues discussed in the just mentioned literature
can be found in Leeb and Pötscher |f|.
The only other paper we are aware of that considers distributional properties
of model averaging estimators is Hjort and Claeskens [3]. Hjort and Claeskens [3]
provide a result (Theorem 4.1) that says that - under some regularity conditions -
the asymptotic distribution of a model averaging estimation scheme is the
distribution of the same estimation scheme applied to the limiting experiment (which
is a multivariate normal estimation problem). This result is an immediate consequence
of the continuous mapping theorem, and furthermore becomes vacuous if the
estimation problem one starts with is already a Gaussian problem (as is the case in
the present paper).



2. The model averaging estimator and its finite-sample distribution 

Consider the linear regression model

\[ Y = X\beta + u \]

where $Y$ is $n \times 1$ and where the $n \times k$ non-stochastic design matrix $X$ has full
column rank $k$, implying $n \geq k$. Furthermore, $u$ is normally distributed $N(0, \sigma^2 I_n)$,
$0 < \sigma^2 < \infty$. Although not explicitly shown in the notation, the elements of $Y$,
$X$, and $u$ may depend on sample size $n$. [In fact, the random variables $Y$ and
$u$ may be defined on a sample space that varies with $n$.] Let $P_{n,\beta,\sigma}$ denote the
probability measure on $\mathbb{R}^n$ induced by $Y$, and let $E_{n,\beta,\sigma}$ denote the corresponding
expectation operator. As in [12], we also assume that $\sigma^2$ is known (and thus is
fixed). [Results for the case of unknown $\sigma^2$ that parallel the results in the present
paper can be obtained if $\sigma^2$ is replaced by the residual variance estimator derived
from the unrestricted model. The key to such results is the observation that this
variance estimator is independent of the least squares estimator for $\beta$. The same
idea has been used in [7] to derive distributional properties of post-model-selection
estimators in the unknown variance case from the known variance case. For brevity
we do not give any details on the unknown variance case in this paper.] Suppose
further that $k > 1$, and that $X$ and $\beta$ are commensurably partitioned as

\[ X = [X_1 : X_2] \]

and $\beta = [\beta_1', \beta_2']'$ where $X_i$ has dimension $n \times k_i$ with $k_i \geq 1$. Let the restricted
model be defined as $M_R = \{\beta \in \mathbb{R}^k : \beta_2 = 0\}$ and let $M_U = \mathbb{R}^k$ denote the
unrestricted model. Let $\hat\beta(R)$ denote the restricted least squares estimator, i.e., the
$k \times 1$ vector given by

\[ \hat\beta(R) = \begin{bmatrix} (X_1'X_1)^{-1}X_1'Y \\ 0_{k_2 \times 1} \end{bmatrix} \]

and let $\hat\beta(U) = (X'X)^{-1}X'Y$ denote the unrestricted least squares estimator.
Leung and Barron [12] consider model averaging estimators in a linear regression
framework allowing for more than two candidate models. Specializing their
estimator to the present situation gives

\[ \tilde\beta = \hat\lambda\,\hat\beta(R) + (1 - \hat\lambda)\,\hat\beta(U) \tag{1} \]

where the weights are given by

\[ \hat\lambda = \left[\exp(-\alpha\hat r(R)/\sigma^2) + \exp(-\alpha\hat r(U)/\sigma^2)\right]^{-1} \exp(-\alpha\hat r(R)/\sigma^2). \]
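Here $\alpha > 0$ is a tuning parameter (note that Leung and Barron's tuning parameter
corresponds to $2\alpha$) and

\[ \hat r(R) = Y'Y - \hat\beta(R)'X'X\hat\beta(R) + \sigma^2(2k_1 - n) \]

and

\[ \hat r(U) = Y'Y - \hat\beta(U)'X'X\hat\beta(U) + \sigma^2(2k - n). \]

For later use we note that

\[
\begin{aligned}
\hat\lambda &= \left[1 + \exp(-2\alpha k_2)\exp\left(-\alpha\left(\hat\beta(R)'X'X\hat\beta(R) - \hat\beta(U)'X'X\hat\beta(U)\right)/\sigma^2\right)\right]^{-1} \\
&= \left[1 + \exp(-2\alpha k_2)\exp\left(\alpha\,\|X\hat\beta(R) - X\hat\beta(U)\|^2/\sigma^2\right)\right]^{-1}
\end{aligned}
\tag{2}
\]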



where $\|x\|$ denotes the Euclidean norm of a vector $x$, i.e., $\|x\| = (x'x)^{1/2}$. Leung
and Barron [12] establish an oracle inequality for the risk $E_{n,\beta,\sigma}(\|X(\tilde\beta - \beta)\|^2)$
and show that the model averaging estimator performs favourably in terms of this
risk. As noted in the introduction, in the present paper we consider distributional
properties of this estimator. Before we now turn to the finite-sample distribution
of the model averaging estimator we introduce some notation: For a symmetric
positive definite matrix $A$ the unique symmetric positive definite root is denoted
by $A^{1/2}$. The largest (smallest) eigenvalue of a matrix $A$ is denoted by $\lambda_{\max}(A)$
($\lambda_{\min}(A)$). Furthermore, $P_R$ and $P_U$ denote the projections on the column space of
$X_1$ and of $X$, respectively.
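The definitions above can be sketched in code. The following is a minimal illustration, not taken from the paper: the function name `model_average`, the use of NumPy, and the simulated data in the usage note are assumptions made here. It computes the restricted and unrestricted least squares estimators, the weight as defined from the two criteria, the equivalent form (2), and the averaged estimator (1).

```python
import numpy as np

def model_average(Y, X, k1, sigma2, alpha):
    """Return (beta_tilde, lam, lam_alt) for the two-model averaging estimator.

    lam is computed from the criteria r(R), r(U); lam_alt uses form (2).
    Hypothetical helper written for illustration only.
    """
    n, k = X.shape
    k2 = k - k1
    X1 = X[:, :k1]
    # Restricted LS: regress on X1 only, zeros in the beta_2 block.
    b_R = np.zeros(k)
    b_R[:k1] = np.linalg.solve(X1.T @ X1, X1.T @ Y)
    # Unrestricted LS.
    b_U = np.linalg.solve(X.T @ X, X.T @ Y)
    fit_R, fit_U = X @ b_R, X @ b_U
    # Criteria r(R) and r(U) with known error variance sigma2.
    r_R = Y @ Y - fit_R @ fit_R + sigma2 * (2 * k1 - n)
    r_U = Y @ Y - fit_U @ fit_U + sigma2 * (2 * k - n)
    # Weight on the restricted estimator, written as a logistic function
    # of the criterion difference (algebraically equal to the ratio form).
    lam = 1.0 / (1.0 + np.exp(-alpha * (r_U - r_R) / sigma2))
    # Equivalent expression (2) in terms of the gap between fitted values.
    gap = fit_R - fit_U
    lam_alt = 1.0 / (1.0 + np.exp(-2 * alpha * k2) * np.exp(alpha * (gap @ gap) / sigma2))
    beta_tilde = lam * b_R + (1.0 - lam) * b_U
    return beta_tilde, lam, lam_alt
```

For example, with a simulated design `X` of shape (50, 3), `k1=2`, `sigma2=1.0` and `alpha=0.5`, the two weight expressions should agree up to rounding error.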

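Proposition 1. The finite-sample distribution of $\sqrt{n}(\tilde\beta - \beta)$ is given by the
distribution of

\[
B_n\sqrt{n}\beta_2 + C_n\sqrt{n}Z_1
+ \left[1 + \exp(2\alpha k_2)\exp\left(-\alpha\left\|Z_2 + (X_2'(I - P_R)X_2)^{1/2}\beta_2\right\|^2/\sigma^2\right)\right]^{-1}
\left\{D_n\sqrt{n}Z_2 - B_n\sqrt{n}\beta_2\right\}
\tag{3}
\]

which can also be written as

\[
C_n\sqrt{n}Z_1 + D_n\sqrt{n}Z_2
- \left[1 + \exp(-2\alpha k_2)\exp\left(\alpha\left\|Z_2 + (X_2'(I - P_R)X_2)^{1/2}\beta_2\right\|^2/\sigma^2\right)\right]^{-1}
\left\{D_n\sqrt{n}Z_2 - B_n\sqrt{n}\beta_2\right\}.
\tag{4}
\]

Here

\[
B_n = \begin{bmatrix} (X_1'X_1)^{-1}X_1'X_2 \\ -I_{k_2} \end{bmatrix}, \quad
C_n = \begin{bmatrix} (X_1'X_1)^{-1/2} \\ 0_{k_2 \times k_1} \end{bmatrix}, \quad
D_n = \begin{bmatrix} -(X_1'X_1)^{-1}X_1'X_2\,(X_2'(I - P_R)X_2)^{-1/2} \\ (X_2'(I - P_R)X_2)^{-1/2} \end{bmatrix},
\]

and $Z_1$ and $Z_2$ are independent, $Z_1 \sim N(0, \sigma^2 I_{k_1})$, and $Z_2 \sim N(0, \sigma^2 I_{k_2})$.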
Proof. Observe that

\[
\tilde\beta = \hat\beta(R) + (1 - \hat\lambda)(\hat\beta(U) - \hat\beta(R))
= \hat\beta(R) + (1 - \hat\lambda)(X'X)^{-1}X'(P_U - P_R)Y
\]

with $P_R = X_1(X_1'X_1)^{-1}X_1'$ and $P_U = X(X'X)^{-1}X'$. Diagonalize the projection
matrix $P_U - P_R$ as

\[ P_U - P_R = U\Delta U' \]

where the orthogonal $n \times n$ matrix $U$ is given by

\[
U = [U_1, U_2, U_3] = \left[X_1(X_1'X_1)^{-1/2} : (I - P_R)X_2(X_2'(I - P_R)X_2)^{-1/2} : U_3\right]
\]

with $U_3$ representing an $n \times (n - k)$ matrix whose columns form an orthonormal
basis of the orthogonal complement of the space spanned by the columns of $X$.
The $n \times n$ matrix $\Delta$ is diagonal with the first $k_1$ as well as the last $n - k$ diagonal
elements equal to zero, and the remaining $k_2$ diagonal elements being equal to 1.
Furthermore, set $V = U'Y$ which is distributed $N(U'X\beta, \sigma^2 I_n)$. Then

\[
\|X\hat\beta(U) - X\hat\beta(R)\|^2 = \|(P_U - P_R)Y\|^2 = \|\Delta V\|^2 = \|V_2\|^2
\]

where $V_2$ is taken from the partition of $V = (V_1', V_2', V_3')'$ into subvectors of
dimensions $k_1$, $k_2$, and $n - k$, respectively. Note that $V_2$ is distributed
$N((X_2'(I - P_R)X_2)^{1/2}\beta_2, \sigma^2 I_{k_2})$. Hence, in view of (2) we have that
$(1 - \hat\lambda)(\hat\beta(U) - \hat\beta(R))$ is equal to

\[
\begin{aligned}
& \left[1 + \exp(2\alpha k_2)\exp\left(-\alpha\|V_2\|^2/\sigma^2\right)\right]^{-1}(X'X)^{-1}X'U\Delta V \\
&\quad = \left[1 + \exp(2\alpha k_2)\exp\left(-\alpha\|V_2\|^2/\sigma^2\right)\right]^{-1}(X'X)^{-1}
\begin{bmatrix} 0_{k_1 \times 1} \\ X_2'U_2V_2 \end{bmatrix} \\
&\quad = \left[1 + \exp(2\alpha k_2)\exp\left(-\alpha\|V_2\|^2/\sigma^2\right)\right]^{-1}D_nV_2.
\end{aligned}
\]

Furthermore,

\[
\begin{aligned}
\hat\beta(R) &= (X'X)^{-1}X'P_RY = (X'X)^{-1}X'P_RUV \\
&= (X'X)^{-1}X'X_1(X_1'X_1)^{-1/2}V_1
= \begin{bmatrix} (X_1'X_1)^{-1/2} \\ 0_{k_2 \times k_1} \end{bmatrix}V_1 = C_nV_1
\end{aligned}
\]

with $V_1$ distributed $N((X_1'X_1)^{-1/2}X_1'X\beta, \sigma^2 I_{k_1})$. Hence, the finite sample
distribution of $\tilde\beta$ is the distribution of

\[
C_nV_1 + \left[1 + \exp(2\alpha k_2)\exp\left(-\alpha\|V_2\|^2/\sigma^2\right)\right]^{-1}D_nV_2
\tag{5}
\]

where $V_1$ and $V_2$ are independent normally distributed with parameters given above.
Defining $Z_i$ as the centered versions of $V_i$, subtracting $\beta$, and scaling by $\sqrt{n}$ then
delivers the result. □
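The representation (3) is an exact algebraic identity for every realization of the errors, which allows a quick numerical sanity check. The sketch below is illustrative and not part of the original argument: the names `sym_power` and `representation_check` are hypothetical, and $Z_1$, $Z_2$ are taken to be the centered versions of $V_1$, $V_2$ from the proof, i.e., $Z_1 = (X_1'X_1)^{-1/2}X_1'u$ and $Z_2 = (X_2'(I - P_R)X_2)^{-1/2}X_2'(I - P_R)u$.

```python
import numpy as np

def sym_power(A, p):
    """A**p for a symmetric positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * w**p) @ V.T

def representation_check(X, k1, beta, u, sigma2, alpha):
    """Return sqrt(n)*(beta_tilde - beta) computed directly and via (3)."""
    n, k = X.shape
    k2 = k - k1
    X1, X2 = X[:, :k1], X[:, k1:]
    Y = X @ beta + u
    # Direct computation of the model averaging estimator.
    b_R = np.zeros(k)
    b_R[:k1] = np.linalg.solve(X1.T @ X1, X1.T @ Y)
    b_U = np.linalg.solve(X.T @ X, X.T @ Y)
    d = X @ (b_R - b_U)
    lam = 1.0 / (1.0 + np.exp(-2 * alpha * k2) * np.exp(alpha * (d @ d) / sigma2))
    direct = np.sqrt(n) * (lam * b_R + (1.0 - lam) * b_U - beta)
    # Ingredients of representation (3).
    A = X1.T @ X1
    M = np.eye(n) - X1 @ np.linalg.solve(A, X1.T)      # I - P_R
    S = X2.T @ M @ X2                                  # X2'(I - P_R)X2
    S_mh = sym_power(S, -0.5)
    A_inv_X12 = np.linalg.solve(A, X1.T @ X2)
    B = np.vstack([A_inv_X12, -np.eye(k2)])
    C = np.vstack([sym_power(A, -0.5), np.zeros((k2, k1))])
    D = np.vstack([-A_inv_X12 @ S_mh, S_mh])
    Z1 = sym_power(A, -0.5) @ X1.T @ u
    Z2 = S_mh @ X2.T @ M @ u
    beta2 = beta[k1:]
    V2 = Z2 + sym_power(S, 0.5) @ beta2
    w = 1.0 / (1.0 + np.exp(2 * alpha * k2) * np.exp(-alpha * (V2 @ V2) / sigma2))
    rep = np.sqrt(n) * (B @ beta2 + C @ Z1 + w * (D @ Z2 - B @ beta2))
    return direct, rep
```

Both returned vectors should coincide up to rounding error for any design with full column rank.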

Remark 2. (i) The first two terms in (3) represent the distribution of $\sqrt{n}(\hat\beta(R) - \beta)$,
whereas the third term represents the distribution of $(1 - \hat\lambda)\sqrt{n}(\hat\beta(U) - \hat\beta(R))$. In
(4), the first two terms represent the distribution of $\sqrt{n}(\hat\beta(U) - \beta)$, whereas the
third term represents the distribution of $-\hat\lambda\sqrt{n}(\hat\beta(U) - \hat\beta(R))$.
(ii) If $\beta_2 = 0$ then (3) can be rewritten as

\[
C_n\sqrt{n}Z_1 + \|Z_2\|\left[1 + \exp(2\alpha k_2)\exp\left(-\alpha\|Z_2\|^2/\sigma^2\right)\right]^{-1}D_n\sqrt{n}\,(Z_2/\|Z_2\|)
\]

showing that this term has the same distribution as

\[
C_n\sqrt{n}Z_1 + \sqrt{\chi^2}\left[1 + \exp(2\alpha k_2)\exp\left(-\alpha\chi^2/\sigma^2\right)\right]^{-1}D_n\sqrt{n}\,U
\]

where $\chi^2 = \|Z_2\|^2$ is distributed as $\sigma^2$ times a chi-square random variable with
$k_2$ degrees of freedom, $U = Z_2/\|Z_2\|$ is uniformly distributed on the unit sphere
in $\mathbb{R}^{k_2}$, and $Z_1$, $\chi^2$, and $U$ are mutually independent.

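Theorem 3. The finite-sample distribution of $\sqrt{n}(\tilde\beta - \beta)$ possesses a density $f_{n,\beta,\sigma}$
given by

\[
\begin{aligned}
f_{n,\beta,\sigma}(t) = {} & (2\pi\sigma^2)^{-k/2}\left[\det(X'X/n)\right]^{1/2} \\
& \times \exp\left(-(2\sigma^2)^{-1}\left\|n^{-1/2}(X_1'X_1)^{1/2}t_1 + n^{-1/2}(X_1'X_1)^{-1/2}X_1'X_2t_2\right\|^2\right) \\
& \times \left[1 + \exp\left(-\alpha\sigma^{-2}g\left(\|n^{-1/2}D_{n2}^{-1}(t_2 + n^{1/2}\beta_2)\|\right)^2 + 2\alpha k_2\right)\right]^{k_2} \\
& \times \left\{1 + 2\alpha\sigma^{-2}g\left(\|n^{-1/2}D_{n2}^{-1}(t_2 + n^{1/2}\beta_2)\|\right)^2
\left[1 + \exp\left(\alpha\sigma^{-2}g\left(\|n^{-1/2}D_{n2}^{-1}(t_2 + n^{1/2}\beta_2)\|\right)^2 - 2\alpha k_2\right)\right]^{-1}\right\}^{-1} \\
& \times \exp\Big(-(2\sigma^2)^{-1}\Big\|g\left(\|n^{-1/2}D_{n2}^{-1}(t_2 + n^{1/2}\beta_2)\|\right)
\left\|n^{-1/2}D_{n2}^{-1}(t_2 + n^{1/2}\beta_2)\right\|^{-1} \\
& \qquad\qquad \times n^{-1/2}D_{n2}^{-1}(t_2 + n^{1/2}\beta_2) - D_{n2}^{-1}\beta_2\Big\|^2\Big)
\end{aligned}
\tag{6}
\]

where $t$ is partitioned as $(t_1', t_2')'$ with $t_1$ being a $k_1 \times 1$ vector. Furthermore,
$D_{n2} = (X_2'(I - P_R)X_2)^{-1/2}$, and $g$ is as defined in the Appendix (with $a = \exp(2\alpha k_2)$ and
$b = \alpha^{-1}\sigma^2$).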

Proof. By (5) we have that the finite-sample distribution of $\sqrt{n}(\tilde\beta - \beta)$ is the
distribution of

\[ -\sqrt{n}\beta + \sqrt{n}\,[C_n : D_n]\,[V_1' : V_3']' \]

where

\[ V_3 = \left[1 + \exp(2\alpha k_2)\exp\left(-\alpha\|V_2\|^2/\sigma^2\right)\right]^{-1}V_2. \]

By Lemmata 15 and 16 in the Appendix it follows that $V_3$ possesses the density

\[
\begin{aligned}
\psi(v_3) = {} & (2\pi\sigma^2)^{-k_2/2}\left[1 + \exp\left(-\alpha\sigma^{-2}g(\|v_3\|)^2 + 2\alpha k_2\right)\right]^{k_2} \\
& \times \left\{1 + 2\alpha\sigma^{-2}g(\|v_3\|)^2\left[1 + \exp\left(\alpha\sigma^{-2}g(\|v_3\|)^2 - 2\alpha k_2\right)\right]^{-1}\right\}^{-1} \\
& \times \exp\left(-(2\sigma^2)^{-1}\left\|g(\|v_3\|)\,v_3/\|v_3\| - (X_2'(I - P_R)X_2)^{1/2}\beta_2\right\|^2\right).
\end{aligned}
\]

Since $V_1$ is independent of $V_2$, and hence of $V_3$, the joint density of $[V_1' : V_3']'$ exists
and is given by

\[
(2\pi\sigma^2)^{-k_1/2}\exp\left(-(2\sigma^2)^{-1}\left\|v_1 - (X_1'X_1)^{-1/2}X_1'X\beta\right\|^2\right)\psi(v_3).
\]

Since the matrix $[C_n : D_n]$ is non-singular we obtain for the density of $\sqrt{n}(\tilde\beta - \beta)$

\[
\begin{aligned}
& (2\pi\sigma^2)^{-k_1/2}\,n^{-k/2}\left[\det(X_1'X_1)\det(X_2'(I - P_R)X_2)\right]^{1/2} \\
& \times \exp\Big(-(2\sigma^2)^{-1}\Big\|n^{-1/2}(X_1'X_1)^{1/2}(t_1 + n^{1/2}\beta_1) \\
& \qquad + n^{-1/2}(X_1'X_1)^{-1/2}X_1'X_2(t_2 + n^{1/2}\beta_2) - (X_1'X_1)^{-1/2}X_1'X\beta\Big\|^2\Big) \\
& \times \psi\left(n^{-1/2}(X_2'(I - P_R)X_2)^{1/2}(t_2 + n^{1/2}\beta_2)\right).
\end{aligned}
\]

Note that $\det(X_1'X_1)\det(X_2'(I - P_R)X_2) = \det(X'X)$. Using this, and inserting
the definition of $\psi$, delivers the final result (6). □

Remark 4. From Proposition 1 one can immediately obtain the finite-sample
distribution of $\sqrt{n}A_n(\tilde\beta - \beta)$ by premultiplying (3) or (4) by $A_n$. Here $A_n$ is an arbitrary
(nonstochastic) $p_n \times k$ matrix. If $A_n$ has full row-rank equal to $k$ (implying $p_n = k$),
this distribution has a density, which is given by $\det(A_n)^{-1}f_{n,\beta,\sigma}(A_n^{-1}s)$, $s \in \mathbb{R}^k$.



3. Asymptotic properties 



For the asymptotic results we shall - besides the basic assumptions made in the
preceding section - also assume that

\[ \lim_{n \to \infty} X'X/n = Q \tag{7} \]

exists and is positive definite, i.e., $Q > 0$. We first establish "uniform $\sqrt{n}$-consistency"
of the model averaging estimator, implying, in particular, uniform consistency
of this estimator.

Theorem 5. Suppose (7) holds.

1. Then $\tilde\beta$ is uniformly $\sqrt{n}$-consistent for $\beta$, in the sense that

\[
\lim_{M \to \infty}\,\sup_{n \geq k}\,\sup_{\beta \in \mathbb{R}^k} P_{n,\beta,\sigma}\left(\sqrt{n}\,\|\tilde\beta - \beta\| \geq M\right) = 0.
\tag{8}
\]

Consequently, for every $\varepsilon > 0$

\[
\lim_{n \to \infty}\,\sup_{\beta \in \mathbb{R}^k} P_{n,\beta,\sigma}\left(c_n\|\tilde\beta - \beta\| \geq \varepsilon\right) = 0
\tag{9}
\]

holds for any sequence of real numbers $c_n \geq 0$ satisfying $c_n = o(n^{1/2})$; which
reduces to uniform consistency for $c_n = 1$.
2. The results in Part 1 also hold for $A_n\tilde\beta$ as an estimator of $A_n\beta$, where $A_n$ are
arbitrary (nonstochastic) matrices of dimension $p_n \times k$ such that the largest
eigenvalues $\lambda_{\max}(A_n'A_n)$ are bounded.

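Proof. We prove (8) first. Rewrite the model averaging estimator as
$\tilde\beta = \hat\beta(U) + \hat\lambda(\hat\beta(R) - \hat\beta(U))$. Since

\[
\|\tilde\beta - \beta\| \leq \|\hat\beta(U) - \beta\| + \hat\lambda\,\|\hat\beta(R) - \hat\beta(U)\|,
\]

since

\[
P_{n,\beta,\sigma}\left(\sqrt{n}\,\|\hat\beta(U) - \beta\| \geq M\right) \leq M^{-1}\sigma\sqrt{\operatorname{trace}[(X'X/n)^{-1}]},
\]

and since $\operatorname{trace}[(X'X/n)^{-1}] \to \operatorname{trace}[Q^{-1}] < \infty$, it suffices to establish

\[
\lim_{M \to \infty}\,\sup_{n \geq k}\,\sup_{\beta \in \mathbb{R}^k} P_{n,\beta,\sigma}\left(\sqrt{n}\,\hat\lambda\,\|\hat\beta(R) - \hat\beta(U)\| \geq M\right) = 0.
\tag{10}
\]

Now, using (2) and the elementary inequality $z^2/[1 + c\exp(z^2)]^2 \leq c^{-2}$ we have

\[
\begin{aligned}
\hat\lambda^2\|\hat\beta(R) - \hat\beta(U)\|^2
&\leq \hat\lambda^2\lambda_{\min}^{-1}(X'X)\,\|X\hat\beta(R) - X\hat\beta(U)\|^2 \\
&= \lambda_{\min}^{-1}(X'X)\left[1 + \exp(-2\alpha k_2)\exp\left(\alpha\|X\hat\beta(R) - X\hat\beta(U)\|^2/\sigma^2\right)\right]^{-2}
\|X\hat\beta(R) - X\hat\beta(U)\|^2 \\
&\leq n^{-1}\lambda_{\min}^{-1}(X'X/n)\,\alpha^{-1}\sigma^2\exp(4\alpha k_2) \leq Kn^{-1}\sigma^2
\end{aligned}
\tag{11}
\]

for a suitable finite constant $K$, since $\lambda_{\min}(X'X/n) \to \lambda_{\min}(Q) > 0$. This proves
(10) and thus completes the proof of (8). The remaining claims in Part 1 follow
now immediately. Part 2 is an immediate consequence of Part 1, of the inequality

\[
\|A_n\tilde\beta - A_n\beta\| \leq \lambda_{\max}^{1/2}(A_n'A_n)\,\|\tilde\beta - \beta\|,
\]

and of the assumption on $\lambda_{\max}(A_n'A_n)$.

□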
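The deterministic bound in (11) can be checked numerically for any data set. The sketch below is an illustration written for this purpose and is not part of the proof: the function name `gap_and_bound` and the simulated inputs are assumptions. It returns the left-hand side $\hat\lambda^2\|\hat\beta(R) - \hat\beta(U)\|^2$ together with the bound $\lambda_{\min}^{-1}(X'X)\,\alpha^{-1}\sigma^2\exp(4\alpha k_2)$.

```python
import numpy as np

def gap_and_bound(Y, X, k1, sigma2, alpha):
    """Return (lhs, bound) for the inequality chain (11)."""
    n, k = X.shape
    k2 = k - k1
    X1 = X[:, :k1]
    b_R = np.zeros(k)
    b_R[:k1] = np.linalg.solve(X1.T @ X1, X1.T @ Y)
    b_U = np.linalg.solve(X.T @ X, X.T @ Y)
    diff = b_R - b_U
    fit_gap = X @ diff
    # Weight on the restricted estimator, using form (2).
    lam = 1.0 / (1.0 + np.exp(-2 * alpha * k2 + alpha * (fit_gap @ fit_gap) / sigma2))
    lhs = lam**2 * (diff @ diff)
    # Smallest eigenvalue of X'X (eigvalsh returns ascending order).
    lam_min = np.linalg.eigvalsh(X.T @ X)[0]
    bound = sigma2 * np.exp(4 * alpha * k2) / (alpha * lam_min)
    return lhs, bound
```

Whatever the true parameter, `lhs` should never exceed `bound`; this is the content of Remark 6(i) below, since the bound is of order $n^{-1}$ when $X'X/n$ converges to a positive definite limit.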



Remark 6. (i) The proof has in fact shown that the difference between $\tilde\beta$ and
$\hat\beta(U)$ is bounded in norm by a deterministic sequence of the form const $\cdot\,\sigma n^{-1/2}$.

(ii) Although of little statistical significance since $\sigma^2$ is here assumed to be known,
the proof also shows that the above theorem remains true if a supremum over
$0 < \sigma^2 \leq S$, ($0 < S < \infty$) is inserted in (8) and (9).

In the next two theorems we give the asymptotic distribution under general
"moving parameter" asymptotics. Note that the case of fixed parameter asymptotics
($\beta^{(n)} \equiv \beta$) as well as the case of the usual local alternative asymptotics
($\beta^{(n)} = \beta + \delta/\sqrt{n}$) is covered by the subsequent theorems. In both these cases, Part 1 of
the subsequent theorem applies if $\beta_2 \neq 0$, while Part 2 with $\gamma = 0$ and $\gamma = \delta_2$,
respectively, applies if $\beta_2 = 0$.

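Theorem 7. Suppose (7) holds.

1. Let $\beta^{(n)}$ be a sequence of parameters such that $\|\sqrt{n}\beta_2^{(n)}\| \to \infty$ as $n \to \infty$.
Then the distribution of $\sqrt{n}(\tilde\beta - \beta^{(n)})$ under $P_{n,\beta^{(n)},\sigma}$ converges weakly to a
$N(0, \sigma^2Q^{-1})$-distribution.

2. Let $\beta^{(n)}$ be a sequence of parameters such that $\sqrt{n}\beta_2^{(n)} \to \gamma \in \mathbb{R}^{k_2}$ as $n \to \infty$.
Then the distribution of $\sqrt{n}(\tilde\beta - \beta^{(n)})$ under $P_{n,\beta^{(n)},\sigma}$ converges weakly to the
distribution of

\[
B_\infty\gamma + C_\infty Z_1
+ \left[1 + \exp(2\alpha k_2)\exp\left(-\alpha\left\|Z_2 + (Q_{22} - Q_{21}Q_{11}^{-1}Q_{12})^{1/2}\gamma\right\|^2/\sigma^2\right)\right]^{-1}
\times \left\{D_\infty Z_2 - B_\infty\gamma\right\}
\tag{12}
\]

where

\[
B_\infty = \begin{bmatrix} Q_{11}^{-1}Q_{12} \\ -I_{k_2} \end{bmatrix}, \quad
C_\infty = \begin{bmatrix} Q_{11}^{-1/2} \\ 0_{k_2 \times k_1} \end{bmatrix}, \quad
D_\infty = \begin{bmatrix} -Q_{11}^{-1}Q_{12}(Q_{22} - Q_{21}Q_{11}^{-1}Q_{12})^{-1/2} \\ (Q_{22} - Q_{21}Q_{11}^{-1}Q_{12})^{-1/2} \end{bmatrix},
\]

and where $Z_1 \sim N(0, \sigma^2I_{k_1})$ is independent of $Z_2 \sim N(0, \sigma^2I_{k_2})$. The density
of the distribution of (12) is given by

\[
\begin{aligned}
f_{\infty,\gamma}(t) = {} & (2\pi\sigma^2)^{-k/2}\left[\det(Q)\right]^{1/2}
\times \exp\left(-(2\sigma^2)^{-1}\left\|Q_{11}^{1/2}t_1 + Q_{11}^{-1/2}Q_{12}t_2\right\|^2\right) \\
& \times \left[1 + \exp\left(-\alpha\sigma^{-2}g\left(\|D_{\infty 2}^{-1}(t_2 + \gamma)\|\right)^2 + 2\alpha k_2\right)\right]^{k_2} \\
& \times \left\{1 + 2\alpha\sigma^{-2}g\left(\|D_{\infty 2}^{-1}(t_2 + \gamma)\|\right)^2
\left[1 + \exp\left(\alpha\sigma^{-2}g\left(\|D_{\infty 2}^{-1}(t_2 + \gamma)\|\right)^2 - 2\alpha k_2\right)\right]^{-1}\right\}^{-1} \\
& \times \exp\left(-(2\sigma^2)^{-1}\left\|g\left(\|D_{\infty 2}^{-1}(t_2 + \gamma)\|\right)\left\|D_{\infty 2}^{-1}(t_2 + \gamma)\right\|^{-1}
D_{\infty 2}^{-1}(t_2 + \gamma) - D_{\infty 2}^{-1}\gamma\right\|^2\right),
\end{aligned}
\tag{13}
\]

where $t$ is partitioned as $(t_1', t_2')'$ with $t_1$ being a $k_1 \times 1$ vector. Furthermore,
$D_{\infty 2} = (Q_{22} - Q_{21}Q_{11}^{-1}Q_{12})^{-1/2}$, and $g$ is as defined in the Appendix (with
$a = \exp(2\alpha k_2)$ and $b = \alpha^{-1}\sigma^2$).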

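Proof. To prove Part 1 represent $\sqrt{n}(\tilde\beta - \beta^{(n)})$ as
$\sqrt{n}(\hat\beta(U) - \beta^{(n)}) + \hat\lambda\sqrt{n}(\hat\beta(R) - \hat\beta(U))$.
The first term is $N(0, \sigma^2(X'X/n)^{-1})$-distributed under $P_{n,\beta^{(n)},\sigma}$ which
obviously converges to a $N(0, \sigma^2Q^{-1})$-distribution. It hence suffices to show that
$\hat\lambda\sqrt{n}(\hat\beta(R) - \hat\beta(U))$ converges to zero in $P_{n,\beta^{(n)},\sigma}$-probability. Since $\lambda_{\min}(X'X/n)$
is bounded away from zero by assumption (7) and since

\[
\left\|\hat\lambda\sqrt{n}(\hat\beta(R) - \hat\beta(U))\right\|^2
\leq n\lambda_{\min}^{-1}(X'X)\,\|X\hat\beta(R) - X\hat\beta(U)\|^2
\left[1 + \exp\left(\alpha\sigma^{-2}\|X\hat\beta(R) - X\hat\beta(U)\|^2 - 2\alpha k_2\right)\right]^{-2}
\]

as shown in (11), it furthermore suffices to show that

\[
\|X\hat\beta(R) - X\hat\beta(U)\| \to \infty \quad \text{in } P_{n,\beta^{(n)},\sigma}\text{-probability.}
\tag{14}
\]

Note that

\[
\begin{aligned}
\|X\hat\beta(R) - X\hat\beta(U)\| &= \|(P_U - P_R)Y\|
= \left\|(P_U - P_R)u + (P_U - P_R)X_2\beta_2^{(n)}\right\| \\
&\geq \left\|(P_U - P_R)X_2\beta_2^{(n)}\right\| - \|(P_U - P_R)u\|.
\end{aligned}
\]

The second term satisfies $E_{n,\beta^{(n)},\sigma}\|(P_U - P_R)u\|^2 = \sigma^2k_2$ and hence is stochastically
bounded in $P_{n,\beta^{(n)},\sigma}$-probability. The square of the first term, i.e.,
$\|(P_U - P_R)X_2\beta_2^{(n)}\|^2$, equals

\[
\sqrt{n}\beta_2^{(n)\prime}\left[(X_2'X_2/n) - (X_2'X_1/n)(X_1'X_1/n)^{-1}(X_1'X_2/n)\right]\sqrt{n}\beta_2^{(n)}.
\]

Since the matrix in brackets converges to $Q_{22} - Q_{21}Q_{11}^{-1}Q_{12}$, which is positive
definite, the above display diverges to infinity, establishing (14). This completes the
proof of Part 1.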

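We next turn to the proof of Part 2. The proof of (12) is immediate from (3)
upon observing that $B_n \to B_\infty$, $\sqrt{n}C_n \to C_\infty$, and $\sqrt{n}D_n \to D_\infty$. To prove (13)
observe that (12) can be written as

\[
\begin{aligned}
& B_\infty\gamma + C_\infty Z_1
+ \left[1 + \exp(2\alpha k_2)\exp\left(-\alpha\left\|Z_2 + (Q_{22} - Q_{21}Q_{11}^{-1}Q_{12})^{1/2}\gamma\right\|^2/\sigma^2\right)\right]^{-1}
\times \left\{D_\infty\left(Z_2 + (Q_{22} - Q_{21}Q_{11}^{-1}Q_{12})^{1/2}\gamma\right)\right\} \\
&\quad = B_\infty\gamma + C_\infty Z_1 + D_\infty\left[1 + \exp(2\alpha k_2)\exp\left(-\alpha\|W_2\|^2/\sigma^2\right)\right]^{-1}W_2
\end{aligned}
\]

where $W_2 \sim N((Q_{22} - Q_{21}Q_{11}^{-1}Q_{12})^{1/2}\gamma, \sigma^2I_{k_2})$ is independent of $Z_1$. Again using
Lemmata 15 and 16 in the Appendix gives the density of

\[
W_3 = \left[1 + \exp(2\alpha k_2)\exp\left(-\alpha\|W_2\|^2/\sigma^2\right)\right]^{-1}W_2
\]

as

\[
\begin{aligned}
\chi(w_3) = {} & (2\pi\sigma^2)^{-k_2/2}\left[1 + \exp\left(-\alpha\sigma^{-2}g(\|w_3\|)^2 + 2\alpha k_2\right)\right]^{k_2} \\
& \times \left\{1 + 2\alpha\sigma^{-2}g(\|w_3\|)^2\left[1 + \exp\left(\alpha\sigma^{-2}g(\|w_3\|)^2 - 2\alpha k_2\right)\right]^{-1}\right\}^{-1} \\
& \times \exp\left(-(2\sigma^2)^{-1}\left\|g(\|w_3\|)\,w_3/\|w_3\| - (Q_{22} - Q_{21}Q_{11}^{-1}Q_{12})^{1/2}\gamma\right\|^2\right).
\end{aligned}
\]

Since $Z_1$ is independent of $Z_2$, and hence of $W_3$, the joint density of $[Z_1' : W_3']'$
exists and is given by

\[
(2\pi\sigma^2)^{-k_1/2}\exp\left(-(2\sigma^2)^{-1}\|z_1\|^2\right)\chi(w_3).
\]

Since the matrix $[C_\infty : D_\infty]$ is non-singular we obtain finally

\[
\begin{aligned}
& (2\pi\sigma^2)^{-k_1/2}\left[\det(Q_{11})\det(Q_{22} - Q_{21}Q_{11}^{-1}Q_{12})\right]^{1/2} \\
& \times \exp\left(-(2\sigma^2)^{-1}\left\|Q_{11}^{1/2}(t_1 - Q_{11}^{-1}Q_{12}\gamma) + Q_{11}^{-1/2}Q_{12}(t_2 + \gamma)\right\|^2\right) \\
& \times \chi\left((Q_{22} - Q_{21}Q_{11}^{-1}Q_{12})^{1/2}(t_2 + \gamma)\right).
\end{aligned}
\]

Inserting the expression for $\chi$ derived above gives (13). □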
Since in both cases considered in the above theorem the limiting distribution is
continuous, the finite-sample cumulative distribution function (cdf)

\[
F_{n,\beta^{(n)},\sigma}(t) = P_{n,\beta^{(n)},\sigma}\left(\sqrt{n}(\tilde\beta - \beta^{(n)}) \leq t\right)
\]

converges to the cdf of the corresponding limiting distribution even in the sup-norm
as a consequence of the multivariate version of Pólya's Theorem (cf. Q, Ex.6, 0).
We next show that the convergence occurs in an even stronger sense. Let $f_\infty$ denote
the density of the asymptotic distribution of $\sqrt{n}(\tilde\beta - \beta^{(n)})$ given in the previous
theorem. That is, $f_\infty$ is equal to $f_{\infty,\gamma}$ given in (13) if $\sqrt{n}\beta_2^{(n)} \to \gamma \in \mathbb{R}^{k_2}$, and is
equal to the density of an $N(0, \sigma^2Q^{-1})$-distribution if $\|\sqrt{n}\beta_2^{(n)}\| \to \infty$. For obvious
reasons and for convenience we shall denote the $N(0, \sigma^2Q^{-1})$-density by $f_{\infty,\infty}$.

Theorem 8. Suppose the assumptions of Theorem 7 hold. Then the finite-sample
density $f_{n,\beta^{(n)},\sigma}$ of $\sqrt{n}(\tilde\beta - \beta^{(n)})$ converges to $f_\infty$, the density of the
corresponding asymptotic distribution, in the $L_1$-sense. Consequently, the finite-sample cdf
$F_{n,\beta^{(n)},\sigma}$ converges to the corresponding asymptotic cdf in total variation distance.

Proof. In the case where $\sqrt{n}\beta_2^{(n)} \to \gamma \in \mathbb{R}^{k_2}$, inspection of (6), and noting that
$g$ as well as $T^{-1}$ given in Lemma 15 are continuous, shows that (6) converges to
(13) pointwise. In the case where $\|\sqrt{n}\beta_2^{(n)}\| \to \infty$, Lemma 17 in the Appendix and
inspection of (6) show that (6) converges pointwise to the density of a $N(0, \sigma^2Q^{-1})$-distribution.
Observing that $f_{n,\beta^{(n)},\sigma}$ as well as $f_\infty$ are probability densities, the
proof is then completed by an application of Scheffé's lemma. □

Remark 9. We note for later use that inspection of (13) combined with Lemma 17
in the Appendix shows that for $\|\gamma\| \to \infty$ we have $f_{\infty,\gamma} \to f_{\infty,\infty}$ (the $N(0, \sigma^2Q^{-1})$-density)
pointwise on $\mathbb{R}^k$ and hence also in the $L_1$-sense. As a consequence, the
corresponding cdfs converge in the total variation sense to the cdf of a $N(0, \sigma^2Q^{-1})$-distribution.

Remark 10. The results in this section imply that the convergence of the
finite-sample cdf to the asymptotic cdf does not occur uniformly w.r.t. the parameter $\beta$.
[Cf. also the first step in the proof of Theorem 13 below.]

Remark 11. Theorems 7 and 8 in fact provide a characterization of all accumulation
points of the finite sample distribution $F_{n,\beta^{(n)},\sigma}$ (w.r.t. the total variation
topology) for arbitrary sequences $\beta^{(n)}$. This follows from a simple subsequence
argument applied to $\sqrt{n}\beta_2^{(n)}$ and observing that $(\mathbb{R} \cup \{-\infty, \infty\})^{k_2}$ is compact; cf. also
Remark 4.4 in [7].

Remark 12. Part 1 of Theorem 7 as well as the representation (12) immediately
generalize to $\sqrt{n}A(\tilde\beta - \beta)$ with $A$ a non-stochastic $p \times k$ matrix. If $A$ has full
row-rank equal to $k$, the resulting asymptotic distribution has a density, which is given
by $\det(A)^{-1}f_\infty(A^{-1}s)$, $s \in \mathbb{R}^k$.

4. Estimation of the finite-sample distribution: an impossibility result 

As can be seen from Theorem 3, the finite-sample distribution depends on the
unknown parameter $\beta$, even after centering at $\beta$. Hence, it is obviously of interest
to estimate this distribution, e.g., for purposes of conducting inference. It is easy
to construct a consistent estimator of the cumulative distribution function $F_{n,\beta,\sigma}$
of the scaled and centered model averaging estimator $\tilde\beta$, i.e., of

\[
F_{n,\beta,\sigma}(t) = P_{n,\beta,\sigma}\left(\sqrt{n}(\tilde\beta - \beta) \leq t\right).
\]

To this end, let $\hat M$ be an estimator that consistently decides between the restricted
model $M_R$ and the unrestricted model $M_U$, i.e., $\lim_{n \to \infty}P_{n,\beta,\sigma}(\hat M = M_R) = 1$ if
$\beta_2 = 0$ and $\lim_{n \to \infty}P_{n,\beta,\sigma}(\hat M = M_U) = 1$ if $\beta_2 \neq 0$. [Such a procedure is easily
constructed, e.g., from BIC or from a t-test for the hypothesis $\beta_2 = 0$ with critical
value that diverges to infinity at a rate slower than $n^{1/2}$.] Define $\check f_n$ equal to $f^*_{\infty,\infty}$,
the density of the $N(0, \sigma^2(X'X/n)^{-1})$-distribution, on the event $\hat M = M_U$, and
define $\check f_n$ equal to $f^*_{\infty,0}$ otherwise, where $f^*_{\infty,0}$ follows the same formula as $f_{\infty,0}$,
with the only exception that $Q$ is replaced by $X'X/n$. Then - as is proved in the
Appendix -

\[
\int\left|\check f_n(z) - f_{n,\beta,\sigma}(z)\right|dz \to 0
\tag{15}
\]

in $P_{n,\beta,\sigma}$-probability as $n \to \infty$ for every $\beta \in \mathbb{R}^k$. Define $\check F_n$ as the cdf corresponding
to $\check f_n$. Then for every $\delta > 0$

\[
P_{n,\beta,\sigma}\left(\|\check F_n - F_{n,\beta,\sigma}\|_{TV} > \delta\right) \to 0
\]

as $n \to \infty$, where $\|\cdot\|_{TV}$ denotes the total variation norm. This shows that $\check F_n$ is a
consistent estimator of $F_{n,\beta,\sigma}$ in the total variation distance. A fortiori then also

\[
P_{n,\beta,\sigma}\left(\sup_t\left|\check F_n(t) - F_{n,\beta,\sigma}(t)\right| > \delta\right) \to 0
\]

holds.
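The model selection step used in this construction can be sketched as follows. This is only one possible choice, made here for illustration and not prescribed by the text: the function name `select_model` is hypothetical, and the BIC-type threshold $k_2\log n$ applied to $\|X\hat\beta(U) - X\hat\beta(R)\|^2/\sigma^2$ (with $\sigma^2$ known) is an assumption. Any threshold that diverges, but more slowly than $n$, would serve the same purpose.

```python
import numpy as np

def select_model(Y, X, k1, sigma2):
    """Return "R" or "U" using a BIC-type rule with known error variance.

    The unrestricted model is chosen when the drop in the residual sum of
    squares, scaled by sigma2, exceeds k2 * log(n).
    """
    n, k = X.shape
    X1 = X[:, :k1]
    fit_R = X1 @ np.linalg.lstsq(X1, Y, rcond=None)[0]
    fit_U = X @ np.linalg.lstsq(X, Y, rcond=None)[0]
    # ||X b(U) - X b(R)||^2 equals RSS(R) - RSS(U) for nested LS fits.
    stat = np.sum((fit_U - fit_R) ** 2) / sigma2
    return "U" if stat > (k - k1) * np.log(n) else "R"
```

Given the selected model, the estimated density would then be the normal density on the event "U" and the plug-in version of (13) with $\gamma = 0$ on the event "R", as described above.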

The estimator $\check F_n$ just constructed has been obtained from the asymptotic cdf by
replacing unknown quantities with suitable estimators. As noted in Remark 10, the
convergence of the finite-sample cdf to its asymptotic counterpart does not occur
uniformly w.r.t. the parameter $\beta$. Hence, it is to be expected that $\check F_n$ will inherit
this deficiency, i.e., $\check F_n$ will not be uniformly consistent. Of course, this makes it
problematic to base inference on $\check F_n$, as then there is no guarantee - at any sample
size - that $\check F_n$ will be close to the true cdf. This naturally raises the question if
estimators other than $\check F_n$ exist that are uniformly consistent. The answer turns out
to be negative as we show in the next theorem. In fact, uniform consistency fails
dramatically, cf. (17) below. This result further shows that uniform consistency
already fails over certain shrinking balls in the parameter space (and thus a fortiori
fails in general over compact subsets of the parameter space), and fails even if
one considers the easier estimation problem of estimating $F_{n,\beta,\sigma}$ only at a given
value of the argument $t$ rather than estimating the entire function $F_{n,\beta,\sigma}$ (and
measuring loss in a norm like the total variation norm or the sup-norm). Although
of little statistical significance, we note that a similar result can be obtained for the
problem of estimating the asymptotic cdf. Related impossibility results for
post-model-selection estimators as well as for certain shrinkage-type estimators are given
in 0,0, E|-



In the result to follow we shall consider estimators of $F_{n,\beta,\sigma}(t)$ at a fixed value
of the argument $t$. An estimator of $F_{n,\beta,\sigma}(t)$ is now nothing else than a real-valued
random variable $T_n = T_n(Y, X)$. For mnemonic reasons we shall, however, use the
symbol $\hat F_n(t)$ instead of $T_n$ to denote an arbitrary estimator of $F_{n,\beta,\sigma}(t)$. This
notation should not be taken as implying that the estimator is obtained by evaluating
an estimated cdf at the argument $t$, or that it is constrained to lie between zero
and one. For simplicity, we give the impossibility result only in the simple situation
where $k_2 = 1$ and $Q$ is block-diagonal, i.e., $X_1$ and $X_2$ are asymptotically
orthogonal. There is no reason to believe that the non-uniformity problem will disappear
in more complicated situations.

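Theorem 13. Suppose (7) holds. Suppose further that $k_2 = 1$ and that $Q$ is
block-diagonal, i.e., the $k_1 \times k_2$ matrix $Q_{12}$ is equal to zero. Then the following holds for
every $\beta \in M_R$ and every $t \in \mathbb{R}^k$: There exist $\delta_0 > 0$ and $\rho_0$, $0 < \rho_0 < \infty$, such that
any estimator $\hat F_n(t)$ of $F_{n,\beta,\sigma}(t)$ satisfying

\[
P_{n,\beta,\sigma}\left(\left|\hat F_n(t) - F_{n,\beta,\sigma}(t)\right| > \delta\right) \to 0 \quad \text{as } n \to \infty
\tag{16}
\]

for every $\delta > 0$ (in particular, every estimator that is consistent) also satisfies

\[
\sup_{\vartheta \in \mathbb{R}^k,\ \|\vartheta - \beta\| < \rho_0/\sqrt{n}}
P_{n,\vartheta,\sigma}\left(\left|\hat F_n(t) - F_{n,\vartheta,\sigma}(t)\right| > \delta_0\right) \to 1.
\tag{17}
\]

The constants $\delta_0$ and $\rho_0$ may be chosen in such a way that they depend only on $t$,
$Q$, $\sigma$, and the tuning parameter $\alpha$. Moreover,

\[
\liminf_{n \to \infty}\,\inf_{\hat F_n(t)}\,\sup_{\vartheta \in \mathbb{R}^k,\ \|\vartheta - \beta\| < \rho_0/\sqrt{n}}
P_{n,\vartheta,\sigma}\left(\left|\hat F_n(t) - F_{n,\vartheta,\sigma}(t)\right| > \delta_0\right) > 0
\tag{18}
\]

and

\[
\sup_{\delta > 0}\,\liminf_{n \to \infty}\,\inf_{\hat F_n(t)}\,\sup_{\vartheta \in \mathbb{R}^k,\ \|\vartheta - \beta\| < \rho_0/\sqrt{n}}
P_{n,\vartheta,\sigma}\left(\left|\hat F_n(t) - F_{n,\vartheta,\sigma}(t)\right| > \delta\right) \geq \frac{1}{2},
\tag{19}
\]

where the infima in (18) and (19) extend over all estimators $\hat F_n(t)$ of $F_{n,\beta,\sigma}(t)$.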

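Proof. Step 1: Let $\beta \in M_R$ and $t \in \mathbb{R}^k$ be given. Observe that by Theorems 7 and
8 the limit

\[
F_{\infty,\gamma}(t) := \lim_{n \to \infty}F_{n,\beta + (\eta', \gamma)'/\sqrt{n},\sigma}(t)
\]

exists for every $\eta \in \mathbb{R}^{k_1}$, $\gamma \in \mathbb{R}^{k_2} = \mathbb{R}$, and does not depend on $\eta$. We now show
that $F_{\infty,\gamma}(t)$ is non-constant in $\gamma \in \mathbb{R}$. First, observe that by Remark 9 and the
block-diagonality assumption on $Q$

\[
\lim_{\gamma \to \infty}F_{\infty,\gamma}(t) = P\left(Q_{11}^{-1/2}Z_1 \leq t_1\right)P\left(Q_{22}^{-1/2}Z_2 \leq t_2\right)
\]

where $Z_1$ and $Z_2$ are as in Theorem 7, $t$ is partitioned as $(t_1', t_2)'$ with $t_2$ a scalar,
and $P$ is the probability measure governing $(Z_1', Z_2)'$. Second, we have from (12)
and the block-diagonality assumption on $Q$ that $F_{\infty,\gamma}(t)$ is the product of
$P(Q_{11}^{-1/2}Z_1 \leq t_1)$ with

\[
P\left(\left[1 + \exp(2\alpha)\exp\left(-\alpha\left(Z_2 + Q_{22}^{1/2}\gamma\right)^2/\sigma^2\right)\right]^{-1}
\left(Q_{22}^{-1/2}Z_2 + \gamma\right) - \gamma \leq t_2\right).
\tag{20}
\]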

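Since $P(Q_{11}^{-1/2}Z_1 \leq t_1)$ is positive and independent of $\gamma$, it suffices to show that
(20) differs from $P(Q_{22}^{-1/2}Z_2 \leq t_2)$ for at least one $\gamma \in \mathbb{R}$. Suppose first that $t_2 > 0$.
Then specializing to the case $\gamma = 0$ in (20) it suffices to show that

\[
P\left(\left[1 + \exp(2\alpha)\exp\left(-\alpha Z_2^2/\sigma^2\right)\right]^{-1}Q_{22}^{-1/2}Z_2 \leq t_2\right)
\tag{21}
\]

differs from $P(Q_{22}^{-1/2}Z_2 \leq t_2)$. But this follows from

\[
\begin{aligned}
P\left(\left[1 + \exp(2\alpha)\exp\left(-\alpha Z_2^2/\sigma^2\right)\right]^{-1}Q_{22}^{-1/2}Z_2 \leq t_2\right)
&= 1/2 + P\left(Z_2 > 0,\ h(Z_2) \leq Q_{22}^{1/2}t_2\right) \\
&= 1/2 + P\left(0 < Z_2 \leq g(Q_{22}^{1/2}t_2)\right) \\
&> 1/2 + P\left(0 < Z_2 \leq Q_{22}^{1/2}t_2\right) = P\left(Q_{22}^{-1/2}Z_2 \leq t_2\right)
\end{aligned}
\]

since $h$ as defined in the Appendix (with $a = \exp(2\alpha)$ and $b = \sigma^2/\alpha$) is strictly
monotonically increasing and satisfies $h(x) < x$ for every $x > 0$, which entails
$g(y) > y$ for every $y > 0$. For symmetry reasons a dual statement holds for $t_2 < 0$.
It remains to consider the case $t_2 = 0$. In this case (20) equals

\[
P\left(\left[1 + \exp(2\alpha)\exp\left(-\alpha\left(Z_2 + Q_{22}^{1/2}\gamma\right)^2/\sigma^2\right)\right]^{-1}
\left(Z_2 + Q_{22}^{1/2}\gamma\right) \leq Q_{22}^{1/2}\gamma\right).
\tag{22}
\]

Let $\gamma > 0$ be arbitrary. Then (22) equals

\[
P\left(Z_2 + Q_{22}^{1/2}\gamma \leq 0\right) + P\left(Z_2 + Q_{22}^{1/2}\gamma > 0,\ h\left(Z_2 + Q_{22}^{1/2}\gamma\right) \leq Q_{22}^{1/2}\gamma\right).
\]

Arguing as before, this can be written as

\[
\begin{aligned}
& P\left(Z_2 + Q_{22}^{1/2}\gamma \leq 0\right) + P\left(0 < Z_2 + Q_{22}^{1/2}\gamma \leq g\left(Q_{22}^{1/2}\gamma\right)\right) \\
&\quad > P\left(Z_2 + Q_{22}^{1/2}\gamma \leq 0\right) + P\left(0 < Z_2 + Q_{22}^{1/2}\gamma \leq Q_{22}^{1/2}\gamma\right) \\
&\quad = P(Z_2 \leq 0) = P\left(Q_{22}^{-1/2}Z_2 \leq 0\right)
\end{aligned}
\]

which completes the proof of Step 1.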
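The non-constancy established in Step 1 can be illustrated numerically. The sketch below is not part of the proof: the function names, the integration grid, and the parameter values in the usage note are assumptions made for illustration. It approximates the probability (20) by a Riemann sum over a grid of values for $Z_2 \sim N(0, \sigma^2)$ and allows comparison with the normal probability $P(Q_{22}^{-1/2}Z_2 \leq t_2)$ that arises in the limit $\gamma \to \infty$.

```python
import math
import numpy as np

def limit_cdf_factor(t2, gamma, alpha, sigma, q22):
    """Approximate the probability (20) by a Riemann sum over Z_2."""
    q = math.sqrt(q22)
    z = np.linspace(-10.0 * sigma, 10.0 * sigma, 400001)
    pdf = np.exp(-0.5 * (z / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    # Random shrinkage factor [1 + exp(2a) exp(-a (Z_2 + q*gamma)^2 / sigma^2)]^{-1}.
    shrink = 1.0 / (1.0 + math.exp(2.0 * alpha)
                    * np.exp(-alpha * (z + q * gamma) ** 2 / sigma ** 2))
    event = shrink * (z / q + gamma) - gamma <= t2
    dz = z[1] - z[0]
    return float(np.sum(pdf * event) * dz)

def normal_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
```

With, say, `t2=1.0`, `alpha=0.5`, `sigma=1.0` and `q22=1.0`, the value at `gamma=0.0` should lie visibly above `normal_cdf(1.0)`, in line with the strict inequality shown for (21), while the value for a large `gamma` should be close to `normal_cdf(1.0)`.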

Step 2: We prove (17) and (18) first. For this purpose we make use of Lemma 3.1
in Leeb and Pötscher [11] with the notational identification $\alpha = \beta \in M_R$, $B = \mathbb{R}^k$,
$B_n = \{\vartheta \in \mathbb{R}^k : \|\vartheta - \beta\| < \rho_0n^{-1/2}\}$, $\varphi_n(\cdot) = F_{n,\cdot,\sigma}(t)$, and $\hat\varphi_n = \hat F_n(t)$, where
$\rho_0$ will be chosen shortly. The contiguity assumption of this lemma is obviously
satisfied; cf. also Lemma A.1 in [11]. It hence remains to show that there exists a
value of $\rho_0$, $0 < \rho_0 < \infty$, such that $\delta^*$ defined in Lemma 3.1 of Leeb and Pötscher
[11], which represents the limit inferior of the oscillation of $F_{n,\cdot,\sigma}(t)$ over $B_n$, is
positive. Applying Lemma 3.5(a) of Leeb and Pötscher [11] with $\zeta_n = \rho_0n^{-1/2}$ and
with the set $G$ given by $G = \{(\eta', \gamma)' \in \mathbb{R}^k : \|(\eta', \gamma)'\| < 1\}$, it suffices to show that
$F_{\infty,\gamma}(t)$ viewed as a function of $(\eta', \gamma)'$ is non-constant on the set $\{(\eta', \gamma)' \in \mathbb{R}^k :
\|(\eta', \gamma)'\| < \rho_0\}$; in view of Lemma 3.1 of Leeb and Pötscher [11], the corresponding
$\delta_0$ can then be chosen as any positive number less than one-half of the oscillation
of $F_{\infty,\gamma}(t)$ over this set. That such a $\rho_0$ indeed exists now follows from Step 1.
Furthermore, observe that $F_{\infty,\gamma}(t)$ depends only on $\alpha$, $Q$, $\sigma$, and $t$. Hence, $\delta_0$ and $\rho_0$
may be chosen such that they also only depend on these quantities. This completes
the proof of (17) and (18).



To prove (19) we use Corollary 3.4 in [11] with the same identification of
notation as above, with $\zeta_n = \rho_0n^{-1/2}$, and with $V = \mathbb{R}^k$. The asymptotic uniform
equicontinuity condition in that corollary is then satisfied in view of

\[
\|P_{n,\vartheta,\sigma} - P_{n,\theta,\sigma}\|_{TV} \leq 2\Phi\left(\|\theta - \vartheta\|\,\lambda_{\max}^{1/2}(X'X)/(2\sigma)\right) - 1,
\]

cf. Lemma A.1 in [11]. Given that the positivity of $\delta^*$ has already been established
in the previous paragraph, applying Corollary 3.4 in [11] then establishes (19). □

Remark 14. The impossibility result given in the above theorem also holds for
the class of randomized estimators (with $P_{n,\cdot,\sigma}$ replaced by $P^*_{n,\cdot,\sigma}$, the distribution
of the randomized sample). This follows immediately from Lemma 3.6 in [11] and
the attending discussion.

Appendix A: Some technical results 

Let the function $h : [0, \infty) \to [0, \infty)$ be given by $h(\xi) = [1 + a\exp(-\xi^2/b)]^{-1}\xi$ where
$a$ and $b$ are positive real numbers. It is easy to see that $h$ is strictly monotonically
increasing on $[0, \infty)$, is continuous, satisfies $h(0) = 0$ and $\lim_{\xi \to \infty}h(\xi) = \infty$. The
inverse $g : [0, \infty) \to [0, \infty)$ of $h$ clearly exists, is strictly monotonically increasing
on $[0, \infty)$, is continuous, satisfies $g(0) = 0$ and $\lim_{\zeta \to \infty}g(\zeta) = \infty$. In the following
lemma we shall use the natural convention that $g(\|y\|)y/\|y\| = 0$ for $y = 0$, which
makes $y \to g(\|y\|)y/\|y\|$ a continuous function on all of $\mathbb{R}^m$.
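Since $g$ has no closed form, it has to be evaluated numerically. A minimal sketch, assuming plain bisection is acceptable (the function names and the fixed iteration count are choices made here, not taken from the paper): because $\xi/(1 + a) \leq h(\xi) < \xi$ for $\xi > 0$, the value $g(\zeta)$ always lies in the interval $[\zeta, (1 + a)\zeta]$, which provides a valid bracket.

```python
import math

def h(xi, a, b):
    """h(xi) = xi / (1 + a * exp(-xi^2 / b)) for xi >= 0."""
    return xi / (1.0 + a * math.exp(-xi * xi / b))

def g(zeta, a, b):
    """Inverse of h on [0, inf), computed by bisection."""
    if zeta == 0.0:
        return 0.0
    # h(xi) < xi gives g(zeta) > zeta; h(xi) >= xi/(1+a) gives g(zeta) <= (1+a)*zeta.
    lo, hi = zeta, (1.0 + a) * zeta
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if h(mid, a, b) < zeta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For instance, with `a = math.exp(1.0)` and `b = 2.0` (corresponding to $\alpha = 0.5$, $k_2 = 1$, $\sigma^2 = 1$), `g(h(1.3, a, b), a, b)` should return 1.3 up to rounding error, and `g(zeta, a, b) - zeta` becomes negligible for large `zeta`, in line with Lemma 17.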

Lemma 15. Let $T : \mathbb{R}^m \to \mathbb{R}^m$ be given by

\[ T(x) = \left[1 + a\exp(-\|x\|^2/b)\right]^{-1}x \]

where $a$ and $b$ are positive real numbers. Then $T$ is a bijection. Its inverse is given
by

\[ T^{-1}(y) = g(\|y\|)\,y/\|y\| \]

where $g$ has been defined above. Moreover, $T^{-1}$ is continuously partially
differentiable and $\|T^{-1}(y)\| = g(\|y\|)$ holds for all $y$.

Proof. If $y = 0$ it is obvious that $T(T^{-1}(y)) = 0 = y$ in view of the convention
made above. Now suppose that $y \neq 0$. Then

\[
T(T^{-1}(y)) = \left[1 + a\exp\left(-g(\|y\|)^2/b\right)\right]^{-1}g(\|y\|)\,y/\|y\|
= h(g(\|y\|))\,y/\|y\| = y.
\]

Similarly, if $x = 0$ then $T^{-1}(T(x)) = 0$. Now suppose $x \neq 0$. Then $T(x) \neq 0$ and,
observing that $\|T(x)\| = [1 + a\exp(-\|x\|^2/b)]^{-1}\|x\|$, we have

\[
\begin{aligned}
T^{-1}(T(x)) &= g(\|T(x)\|)\,T(x)/\|T(x)\| \\
&= g\left(\left[1 + a\exp\left(-\|x\|^2/b\right)\right]^{-1}\|x\|\right)x/\|x\| \\
&= g(h(\|x\|))\,x/\|x\| = x.
\end{aligned}
\]

That $T^{-1}$ is continuously partially differentiable follows from the corresponding
property of $T$ and the fact that the determinant of the derivative of $T$ never
vanishes, as shown in the next lemma. The final claim is obvious in case $y \neq 0$,
and follows from the convention made above and the fact that $g(0) = 0$ in case
$y = 0$. □

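Lemma 16. Let $T$ be as in the preceding lemma. Then the determinant of the
derivative $D_xT$ is given by

\[
\left[1 + a\exp\left(-\|x\|^2/b\right)\right]^{-m}
\left(1 + 2b^{-1}\|x\|^2\left[1 + a^{-1}\exp\left(\|x\|^2/b\right)\right]^{-1}\right)
\]

which is always positive.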

Proof. Elementary calculations show that

\[
D_xT = \left[1 + a\exp\left(-\|x\|^2/b\right)\right]^{-1}
\times \left\{I_m + 2ab^{-1}\exp\left(-\|x\|^2/b\right)\left[1 + a\exp\left(-\|x\|^2/b\right)\right]^{-1}xx'\right\}.
\]

Since the determinant of $I_m + cxx'$ equals $1 + cx'x$, the result follows.

□
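The determinant formula of Lemma 16 can be checked against a numerical Jacobian. The following sketch is illustrative only: the central-difference approximation, the step size, and the function names are assumptions made here.

```python
import numpy as np

def T(x, a, b):
    """T(x) = x / (1 + a * exp(-||x||^2 / b))."""
    return x / (1.0 + a * np.exp(-(x @ x) / b))

def jacobian_fd(x, a, b, eps=1e-6):
    """Central-difference approximation of the derivative D_x T."""
    m = x.size
    J = np.empty((m, m))
    for j in range(m):
        e = np.zeros(m)
        e[j] = eps
        J[:, j] = (T(x + e, a, b) - T(x - e, a, b)) / (2.0 * eps)
    return J

def det_formula(x, a, b):
    """Closed-form determinant from Lemma 16."""
    s = x @ x
    m = x.size
    return (1.0 + a * np.exp(-s / b)) ** (-m) * (1.0 + 2.0 * s / b / (1.0 + np.exp(s / b) / a))
```

Evaluating `np.linalg.det(jacobian_fd(x, a, b))` and `det_formula(x, a, b)` at the same point should give matching values up to the discretization error of the finite differences.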



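Lemma 17. For $g$ defined above we have

\[ \lim_{\zeta \to \infty}g(\zeta)/\zeta = 1 \]

and

\[ \lim_{\zeta \to \infty}\left((g(\zeta)/\zeta) - 1\right)\zeta = 0. \]

Proof. It suffices to prove the second claim:

\[
\begin{aligned}
\lim_{\zeta \to \infty}\left((g(\zeta)/\zeta) - 1\right)\zeta
&= \lim_{\zeta \to \infty}\left(g(\zeta) - \zeta\right)
= \lim_{\xi \to \infty}\left(g(h(\xi)) - h(\xi)\right) \\
&= \lim_{\xi \to \infty}\xi\left(1 - \left[1 + a\exp\left(-\xi^2/b\right)\right]^{-1}\right) \\
&= \lim_{\xi \to \infty}\xi\left[1 + a^{-1}\exp\left(\xi^2/b\right)\right]^{-1} = 0.
\end{aligned}
\]

□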



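Proof (Verification of (15) in Section 4). In view of Theorem 8 it suffices to show
that

\[ \int\left|\check f_n(z) - f_\infty(z)\right|dz \to 0 \]

in $P_{n,\beta,\sigma}$-probability as $n \to \infty$ for every $\beta \in \mathbb{R}^k$ where we recall that $f_\infty$ is equal
to $f_{\infty,\infty}$, the density of an $N(0, \sigma^2Q^{-1})$-distribution, if $\beta_2 \neq 0$, and is equal to $f_{\infty,0}$
given in (13) if $\beta_2 = 0$. Now,

\[
\begin{aligned}
P_{n,\beta,\sigma}\left(\int\left|\check f_n(z) - f_\infty(z)\right|dz > \varepsilon\right)
&= P_{n,\beta,\sigma}\left(\int\left|\check f_n(z) - f_\infty(z)\right|dz > \varepsilon,\ \hat M = M_R\right) \\
&\quad + P_{n,\beta,\sigma}\left(\int\left|\check f_n(z) - f_\infty(z)\right|dz > \varepsilon,\ \hat M = M_U\right) \\
&= P_{n,\beta,\sigma}\left(\int\left|f^*_{\infty,0}(z) - f_\infty(z)\right|dz > \varepsilon,\ \hat M = M_R\right) \\
&\quad + P_{n,\beta,\sigma}\left(\int\left|f^*_{\infty,\infty}(z) - f_\infty(z)\right|dz > \varepsilon,\ \hat M = M_U\right)
\end{aligned}
\]

where we have made use of the definition of $\check f_n$. If $\beta \in M_R$, then clearly the
event $\hat M = M_U$ has probability approaching zero and hence the last probability
in the above display converges to zero. Furthermore, if $\beta \in M_R$, the last but one
probability reduces to

\[
P_{n,\beta,\sigma}\left(\int\left|f^*_{\infty,0}(z) - f_{\infty,0}(z)\right|dz > \varepsilon,\ \hat M = M_R\right)
\]

which converges to zero since

\[ \int\left|f^*_{\infty,0}(z) - f_{\infty,0}(z)\right|dz \to 0 \]

in view of pointwise convergence of $f^*_{\infty,0}$ to $f_{\infty,0}$ and Scheffé's lemma. [To be able
to apply Scheffé's lemma we need to know that not only $f_{\infty,0}$ but also $f^*_{\infty,0}(z)$ is a
probability density. But this is obvious, as (13) defines a probability density for any
symmetric and positive definite matrix $Q$.] The proof for the case where $\beta \in M_U \setminus M_R$
is completely analogous noting that then $f_\infty = f_{\infty,\infty}$ holds. □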